NSF PAR Search | NSF Public Access Repository


Automated Testing Linguistic Capabilities of NLP Models

https://doi.org/10.1145/3672455

Lee, Jaeseong; Chen, Simin; Mordahl, Austin; Liu, Cong; Yang, Wei; Wei, Shiyi (September 2024, ACM Transactions on Software Engineering and Methodology)

Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing work uses a single, aggregated metric (i.e., accuracy), which makes it difficult for users to assess NLP model performance on fine-grained aspects, such as linguistic capabilities (LCs). To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as inputs and produces a diverse test suite with test oracles for each given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 LCs over the two NLP applications.
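The abstract names Self-BLEU as one of its diversity metrics. As a rough illustration only, the sketch below scores each sentence in a set against all the others with a simplified BLEU (clipped unigram and bigram precision plus a brevity penalty) and averages the results; a lower score suggests a more diverse set. The function names, whitespace tokenization, bigram cutoff, and example sentences are assumptions made for this sketch, not the authors' implementation or evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, references, max_n=2):
    """Simplified BLEU: clipped n-gram precision for n = 1..max_n,
    combined by geometric mean, times a brevity penalty.
    No smoothing: any zero precision yields a score of 0.0."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0
        # Highest count of each n-gram in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the hypothesis.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references), key=lambda l: (abs(l - hyp_len), l))
    penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(sum(log_precisions) / max_n)


def self_bleu(sentences, max_n=2):
    """Average BLEU of each sentence against all other sentences.
    Expects at least two sentences."""
    tokenized = [s.lower().split() for s in sentences]
    scores = [
        bleu(hyp, tokenized[:i] + tokenized[i + 1:], max_n)
        for i, hyp in enumerate(tokenized)
    ]
    return sum(scores) / len(scores)


# Hypothetical test inputs for a sentiment model.
repetitive = ["the movie was great"] * 3
diverse = ["the movie was great", "I hated every minute", "what a dull plot"]
repetitive_score = self_bleu(repetitive)
diverse_score = self_bleu(diverse)
```

In practice an established BLEU implementation with smoothing would likely be preferable; the point here is only that a test suite of near-duplicates scores high and a varied one scores low.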
Full Text Available
RTL-Spec: RTL Spectrum Analysis for Security Bug Localization

https://doi.org/10.1109/HOST55342.2024.10545408

Miftah, Samit S; Kundu, Shamik; Mordahl, Austin; Wei, Shiyi; Basu, Kanad (May 2024, IEEE)

Full Text Available
ECSTATIC: Automatic Configuration-Aware Testing and Debugging of Static Analysis Tools

https://doi.org/10.1145/3597926.3604918

Mordahl, Austin; Soles, Dakota; Miao, Miao; Zhang, Zenong; Wei, Shiyi (July 2023, ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis)

Full Text Available
ECSTATIC: An Extensible Framework for Testing and Debugging Configurable Static Analysis

https://doi.org/10.1109/ICSE48619.2023.00056

Mordahl, Austin; Zhang, Zenong; Soles, Dakota; Wei, Shiyi (May 2023, 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE))

Full Text Available
An empirical assessment of machine learning approaches for triaging reports of static analysis tools

https://doi.org/10.1007/s10664-022-10253-z

Yerramreddy, Sai; Mordahl, Austin; Koc, Ugur; Wei, Shiyi; Foster, Jeffrey S.; Carpuat, Marine; Porter, Adam A. (March 2023, Empirical Software Engineering)

Full Text Available
The impact of tool configuration spaces on the evaluation of configurable taint analysis for Android

https://doi.org/10.1145/3460319.3464823

Mordahl, Austin; Wei, Shiyi (July 2021, Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis)
Full Text Available
SATune: A Study-Driven Auto-Tuning Approach for Configurable Software Verification Tools

https://doi.org/10.1109/ASE51524.2021.9678761

Koc, Ugur; Mordahl, Austin; Wei, Shiyi; Foster, Jeffrey S.; Porter, Adam A. (November 2021, 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE))

Full Text Available
Toward detection and characterization of variability bugs in configurable C software: an empirical study

https://doi.org/10.1109/ICSE-Companion.2019.00064

Mordahl, Austin (May 2019, Proceedings - International Conference on Software Engineering)

Variability in C software is a useful tool, but critical bugs that only exist in certain configurations are easily missed by conventional debugging techniques. Even with a small number of features, the configuration space of configurable software is too large to analyze exhaustively. Variability-aware static analysis for bug detection is being developed, but remains at too early a stage to be fully usable on real-world C programs. In this work, we present a methodology for finding variability bugs by combining variability-oblivious bug detectors, static analysis of build processes, and dynamic feature interaction inference. We further present an empirical study in which we test our methodology on two highly configurable C programs. We found our methodology to be effective, finding 88 true bugs between the two programs, of which 64 were variability bugs.
Full Text Available
An Empirical Study of Real-World Variability Bugs Detected by Variability-Oblivious Tools

https://doi.org/10.1145/3338906.3338967

Mordahl, Austin; Oh, Jeho; Koc, Ugur; Wei, Shiyi; Gazzillo, Paul (August 2019, ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Many critical software systems developed in C utilize compile-time configurability. The many possible configurations of this software make bug detection through static analysis difficult. While variability-aware static analyses have been developed, there remains a gap between those and state-of-the-art static bug detection tools. In order to collect data on how such tools may perform and to develop real-world benchmarks, we present a way to leverage configuration sampling, off-the-shelf “variability-oblivious” bug detectors, and automatic feature identification techniques to simulate a variability-aware analysis. We instantiate our approach using four popular static analysis tools on three highly configurable, real-world C projects, obtaining 36,061 warnings, 80% of which are variability warnings. We analyze the warnings we collect from these experiments, finding that most results are variability warnings of a variety of kinds such as NULL dereference. We then manually investigate these warnings to produce a benchmark of 77 confirmed true bugs (52 of which are variability bugs) useful for future development of variability-aware analyses.
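The idea of simulating a variability-aware analysis from sampled configurations can be sketched as follows. This is a minimal illustration under a stated assumption: a warning is treated as a variability warning when it is reported under some, but not all, sampled configurations. The abstract does not give this definition, and the function name, configuration names, and warning strings are hypothetical.

```python
def classify_warnings(warnings_by_config):
    """Label each warning 'variability' if it appears in only some of the
    sampled configurations, or 'generic' if it appears in all of them.

    warnings_by_config maps a configuration name to the set of warnings a
    variability-oblivious tool reported when run on that configuration.
    """
    all_warnings = set().union(*warnings_by_config.values())
    labels = {}
    for warning in all_warnings:
        present_in = {
            config for config, warnings in warnings_by_config.items()
            if warning in warnings
        }
        if len(present_in) < len(warnings_by_config):
            labels[warning] = "variability"
        else:
            labels[warning] = "generic"
    return labels


# Hypothetical tool output for three sampled configurations.
sampled = {
    "config_a": {"net.c:42 NULL dereference", "util.c:7 unused variable"},
    "config_b": {"util.c:7 unused variable"},
    "config_c": {"net.c:42 NULL dereference", "util.c:7 unused variable"},
}
labels = classify_warnings(sampled)
```

Here the NULL dereference is labeled a variability warning because `config_b` does not report it, while the unused-variable warning appears everywhere. With only a sample of the configuration space, such labels are estimates: a warning seen in every sampled configuration could still be absent from an unsampled one.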
Full Text Available
